    Automated credit assessment framework using ETL process and machine learning

    In today's business environment, real-time analysis of enterprise data through Business Intelligence (BI) is crucial for supporting operational activities and making strategic decisions. An automated ETL (extraction, transformation, and load) process ingests data into the data warehouse in near real time, and the BI process generates insights from that data. This paper focuses on automated credit risk assessment in the financial domain using machine learning. Machine learning-based classification techniques can provide a self-regulating process for categorizing data. An automated credit decision-making system helps a lending institution manage risk, increase operational efficiency, and comply with regulators. We take an empirical approach to credit risk assessment using logistic regression and neural network classification methods in compliance with Basel II standards, which are adopted to calculate the expected loss. The data integration required for building the machine learning models is done through an automated ETL process. We conclude by evaluating this new methodology for credit risk assessment.
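    As a rough illustration of how a logistic-regression score can feed a Basel II style expected-loss figure, the sketch below uses the standard relation EL = PD × LGD × EAD. The weights, bias, feature values, LGD and EAD are hypothetical placeholders, not values from the paper, and the paper's actual model and features are not reproduced here.

```python
import math


def probability_of_default(features, weights, bias):
    """Logistic-regression PD: sigmoid of a linear score of the features."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


def expected_loss(pd_hat, lgd, ead):
    """Expected loss as PD * LGD * EAD (probability of default,
    loss given default, exposure at default)."""
    return pd_hat * lgd * ead


# Hypothetical applicant and model parameters (assumptions for illustration).
weights = [0.8, -1.2]
bias = -2.0
features = [1.5, 0.5]

pd_hat = probability_of_default(features, weights, bias)
el = expected_loss(pd_hat, lgd=0.45, ead=10_000.0)
print(f"PD = {pd_hat:.4f}, expected loss = {el:.2f}")
```

    In practice the weights would come from a model fitted on the warehouse data loaded by the ETL process; here they are fixed by hand only to keep the example self-contained.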

    Seasonal variation of soil enzymes in areas of fluoride stress in Birbhum District, West Bengal, India

    Soil enzyme activities provide a unique biochemical means of assessing soil function as an indicator of soil fertility, which can be altered by an excess of fluoride in the soil and by seasonal changes. Seven sites were chosen in the fluoride-affected area of Nasipur, Birbhum District, West Bengal, India, to compare seasonal changes in enzymes (urease, amylase, cellulase and invertase), fluoride content, physicochemical characteristics and the availability of microbes in the soil with a control. The activity of all the enzymes varied with season. Urease had greater activity in the summer, followed by winter; it showed marginal differences from the control area during the winter (p<0.002) and summer (p<0.110) but a significant (p<0.000) difference during the rainy season. Soil pH had a negative impact on urease activity during both winter and summer. Cellulase activity was accelerated by the organic matter and organic carbon content of the soil. Fluoride therefore acted most strongly against urease activity during the rainy, summer and winter seasons. The microbial population of the soil also showed a negative impact of fluoride, which may in turn affect the soil enzymes and characteristics.

    Algorithms for data mining and bio-informatics

    Knowledge pattern extraction is one of the major topics in the data mining and background knowledge integration domains. Among the many data mining techniques, association rule mining and bi-clustering are two major complementary tasks for these topics. These tasks have gained importance in many domains in recent years, but no approach has been proposed to perform them in one process. Performing the extractions independently raises the problems of the resources required (memory, execution time and data accesses) and of unifying the different results. We propose an original approach for extracting different categories of knowledge patterns while using minimal resources. This approach is based on the theoretical framework of frequent closed patterns and uses a novel suffix-tree based data structure to extract conceptual minimal representations of association rules, bi-clusters and classification rules. These patterns extend the classical frameworks of association rules, classification rules and bi-clusters, since the lists of data objects supporting each pattern and the hierarchical relationships between patterns are also extracted. This approach was applied to the analysis of HIV-1 and human protein-protein interaction data. Analyzing such inter-species protein interactions is a recent major challenge in computational biology. Several databases integrating heterogeneous interaction information and biological background knowledge on proteins have been constructed. Experimental results show that the proposed approach can efficiently process these databases and that the extracted conceptual patterns can help in understanding and analyzing the nature of the relationships between interacting proteins.

    Algorithmes pour la Fouille de Données et la Bioinformatique (Algorithms for Data Mining and Bioinformatics)

    Pattern extraction is one of the major topics in the Knowledge Discovery from Data (KDD) and Background Knowledge Integration (BKI) research domains. Extracting patterns from databases, data warehouses and other kinds of data repositories is one of the most challenging tasks, and it is broadly considered part of data mining. Among the many data mining techniques, association rule mining and bi-clustering are two major complementary tasks for relevant knowledge extraction and integration. These tasks have gained importance in many research domains in recent years. However, to the best of our knowledge, no approach has been proposed to perform these two tasks in one process. In this thesis, we propose an original approach for extracting different categories of knowledge patterns while using minimal resources. These patterns, based on frequent closed sets and supporting object lists, are used to construct conceptual minimal representations of association rules, bi-clusters and classification rules. They extend the classical frameworks of association and classification rules, and of bi-clusters, by providing the user with more information through the object lists associated with these patterns. These patterns are generated from the sets of generators, or key-patterns, the sets of closed patterns and the hierarchical conceptual structure induced from generators, closed patterns and supporting object lists. The proposed approach, named FIST for Frequent Itemset mining using Suffix-Trees, is based on a new suffix-tree data structure that enables the efficient storage of data and computation of relevant patterns in primary memory. The strategy used by FIST is based on the closure of the Galois connection of a finite binary relation, which also serves as the theoretical foundation of the Formal Concept Analysis (FCA) framework.
FIST is an integrated approach based on the Galois closure framework, combining the searches for generators, frequent closed itemsets, association rules, conceptual bi-clusters and classification patterns, and extending the generated patterns for conceptual analysis. To the best of our knowledge, no algorithm in the literature produces the same output patterns as those generated by FIST. Three implementations of the two algorithmic versions of FIST were written in Java, chosen for its portability. These implementations were compared experimentally on various hardware configurations to assess precisely the gains obtained from successive improvements to the algorithm and from the use of the Trove Java collections API. Experimental results and analyses show the performance of the different versions of FIST and compare them to other state-of-the-art algorithms for association rule mining, closed pattern mining and bi-clustering.
FIST was applied to the analysis of a real-life dataset of protein-protein interactions (PPI) between HIV-1 and human proteins. The analysis of protein-protein interactions is a recent and complex field of major importance in bioinformatics, and results obtained in this field have demonstrated its key role in the discovery of new treatments and the prevention of various types of diseases. In order to improve and extend the knowledge patterns extracted from the original HIV-1 and human PPI data, we constructed three new datasets integrating the most recent biological and bibliographic annotations on proteins with PPI data. Successive experimental results for these PPI datasets, and new information discovered using the FIST approach on these datasets, are presented in this report.
As proof of correctness, we also show that FIST successfully found the information currently known in the PPI literature. The experiments on these PPI datasets were performed by extracting with FIST the conceptual hierarchical bi-clusters and the conceptual minimal covers of association rules containing both interaction and annotation information on proteins.
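    To make the Galois closure idea concrete, the sketch below computes the closure of an itemset in a small binary relation between objects and items. It is a naive set-based illustration, not the suffix-tree structure used by FIST, and the toy context (objects `p1` to `p4`, items `a`, `b`, `c`) is a hypothetical example rather than data from the thesis.

```python
from typing import Dict, Set


def extent(context: Dict[str, Set[str]], itemset: Set[str]) -> Set[str]:
    """Objects whose item sets contain every item of `itemset`."""
    return {obj for obj, items in context.items() if itemset <= items}


def intent(context: Dict[str, Set[str]], objects: Set[str]) -> Set[str]:
    """Items shared by all of `objects`; all items if `objects` is empty."""
    if not objects:
        return set().union(*context.values())
    return set.intersection(*(context[obj] for obj in objects))


def closure(context: Dict[str, Set[str]], itemset: Set[str]) -> Set[str]:
    """Galois closure: the intent of the extent of `itemset`."""
    return intent(context, extent(context, itemset))


# Hypothetical binary relation: object -> set of items (assumption).
context = {
    "p1": {"a", "b", "c"},
    "p2": {"a", "b"},
    "p3": {"b", "c"},
    "p4": {"a", "b", "c"},
}

supporting_objects = extent(context, {"a"})  # object list supporting {"a"}
closed = closure(context, {"a"})             # closed itemset generated by {"a"}
print(sorted(supporting_objects), sorted(closed))
```

    In this toy context, every object containing `a` also contains `b`, so `{a}` acts as a generator of the closed itemset `{a, b}`, and the pair (supporting objects, closed itemset) can be read as a small bi-cluster; the exact rule `a → b` follows from the same closure.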

    Galois Closure Based Association Rule Mining from Biological Data


    MOSCFRA: A Multi-objective Genetic Approach for Simultaneous Clustering and Gene Ranking

    Microarray experiments generate a large amount of data, which is used to discover the genetic background of diseases and to characterize genes. Clustering the tissue samples is an important tool for partitioning the dataset according to co-expression patterns. This clustering task is even more difficult when we also try to rank each gene (gene ranking) according to its ability to distinguish different classes of samples. Finding the clusters of samples and the rank of each gene for a given gene expression dataset in a single process is preferable. Many algorithms in the literature find clusters and perform gene ranking or selection separately, and a few perform simultaneous clustering and feature selection. In this article, we propose a new approach that clusters the samples and ranks the genes simultaneously. A novel encoding technique is proposed for the problem of simultaneous clustering and ranking. Results are demonstrated on both artificial and real-life gene expression datasets.

    Defect-Dicubane Ni2Ln2 (Ln = Dy, Tb) Single Molecule Magnets

    Two pairs of Ni2Dy2 and Ni2Tb2 complexes, [Ni2Ln2(L)4(NO3)2(DMF)2] {Ln = Dy (1), Tb (2)} and [Ni2Ln2(L)4(NO3)2(MeOH)2]·3MeOH {Ln = Dy (3), Tb (4)} (H2L is the Schiff base resulting from the condensation of o-vanillin and 2-aminophenol), possessing a defect-dicubane core topology were synthesized and characterized. All four complexes are ferromagnetically coupled, and the two Dy analogues are found to be Single Molecule Magnets (SMMs) with energy barriers in the range 18-28 K. Compound 1 displays step-like hysteresis loops, confirming the SMM behavior. Although 1 and 3 show very similar structural topologies, their dynamic properties are different, with blocking temperatures (3.2 and 4.2 K at a frequency of 1500 Hz) differing by 1 K. This appears to result from a change in orientation of the nitrate ligands on the Dy(III) ions, induced by changes in the ligands on Ni(II).

    Predictors of severity and outcome and roles of intravenous thrombolysis and biomarkers in first ischemic stroke

    Aim: Stroke is one of the leading causes of death and disability. The proportion of patients receiving recombinant tissue plasminogen activator is low in our country. Biomarkers that identify patients at risk of severe disease and guide treatment and prognosis would be valuable. This article aims to identify the factors that can independently prognosticate the acute phase of ischemic stroke. Methods: All patients with a first episode of ischemic stroke admitted to the Neurology Department between 1 December 2017 and 31 March 2018 were included in this pilot study. Stroke severity was evaluated using the National Institute of Health Stroke Scale (NIHSS). Patients admitted within 4.5 h of symptom onset were thrombolysed with injection alteplase. For each patient, 4 serum biomarkers (D-dimer, fibrinogen, C-reactive protein and neuron specific enolase) were evaluated at admission and 24 h later. Discharged patients were assessed on an outpatient basis using the modified Rankin scale (mRS). The study primarily aimed to identify the factors predicting the severity and outcome of stroke, and to evaluate the effect of thrombolysis on the outcome. The secondary aim was to evaluate the role of biomarkers in predicting unfavorable outcome and the chance of post-thrombolysis hemorrhage. Results: Out of 30 patients included in the study, 10 had NIHSS 0-4, 12 had NIHSS 5-15 and 8 had NIHSS 16-42. Sixteen patients had an unfavorable outcome (mRS score ≥ 2), and 5 patients expired. Old age, history of diabetes, CHADS2 score ≥ 2, and total anterior circulation stroke (TACS) independently affected stroke severity, whereas low ejection fraction (< 35%) and TACS independently predicted unfavorable outcome and mortality. High mean arterial blood pressure (MABP) and capillary blood glucose (CBG) at admission were significant predictors of stroke severity, unfavorable outcome, and mortality.
Out of 10 thrombolysed patients, two had an mRS score ≥ 2 and 3 had post-thrombolysis hemorrhage. Thrombolysis significantly reduced the incidence of unfavorable outcome, but did not significantly affect death. All the biomarker levels at admission were significantly higher among patients with severe stroke and those who subsequently had an unfavorable outcome. D-dimer levels significantly increased and fibrinogen levels significantly decreased following thrombolysis. Higher MABP, CBG, and fibrinogen levels at admission predicted a significantly higher chance of developing hemorrhagic complications post thrombolysis. Conclusion: Low ejection fraction, occurrence of TACS and higher levels of the biomarkers under study predicted poor outcome. Higher mean CBG and MABP and raised fibrinogen levels predicted a higher chance of post-thrombolysis hemorrhage.

    Fluoride remediation using floating macrophytes

    Six aquatic macrophytes (Pistia stratiotes, Ceratophyllum demersum, Nymphoides indica, Lemna major, Azolla pinnata, and Eichhornia crassipes) were evaluated for removing fluoride from aqueous solution. Five different concentrations (10, 30, 50, and 100 ppm) of fluoride solution were prepared in 1 L plastic containers. A fixed weight (20 g) of macrophytes together with 500 ml of fluoride solution was placed in each container and observed for 72 hours. All the macrophytes showed their highest fluoride removal between 24 h and 48 h, but after 72 h their efficiency dropped drastically. N. indica showed better removal efficiency than the other experimental macrophytes. In general, pigment measurements indicated higher concentrations at 72 h. However, Pistia sp. showed a higher pigment concentration at the intermediate time point (48 h). A higher dry weight to fresh weight ratio was recorded for L. major and A. pinnata at all concentrations except 10 ppm. In addition, all macrophytes showed a lower relative growth rate (RGR) at higher concentrations. The isotherm study indicated that C. demersum fitted both the Freundlich and Langmuir isotherms well, whereas L. major fitted the Langmuir isotherm, at 24 hours.
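    For readers unfamiliar with the two isotherm models named above, the sketch below writes out their standard textbook forms together with a simple removal-efficiency calculation. The parameter names (`q_max`, `k_l`, `k_f`, `n`) and all numeric inputs are illustrative assumptions; the abstract does not report fitted parameters, so none are reproduced here.

```python
def removal_efficiency(c_initial, c_final):
    """Percentage of fluoride removed from solution."""
    return (c_initial - c_final) / c_initial * 100.0


def langmuir(c_e, q_max, k_l):
    """Langmuir isotherm: q_e = q_max * K_L * C_e / (1 + K_L * C_e)."""
    return q_max * k_l * c_e / (1.0 + k_l * c_e)


def freundlich(c_e, k_f, n):
    """Freundlich isotherm: q_e = K_F * C_e ** (1 / n)."""
    return k_f * c_e ** (1.0 / n)


# Hypothetical equilibrium concentration and parameters (assumptions).
c_e = 10.0
print(removal_efficiency(50.0, 20.0))
print(langmuir(c_e, q_max=5.0, k_l=0.1))
print(freundlich(c_e, k_f=2.0, n=2.0))
```

    The Langmuir form saturates at `q_max` as the equilibrium concentration grows, while the Freundlich form keeps increasing, which is why a dataset may fit one model, the other, or both over a limited concentration range.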